Information and fitness 
The growth rate of organisms depends both on external conditions and on internal states, such as 
the expression levels of various genes. We show that to achieve a criterion mean growth rate over an 
ensemble of conditions, the internal variables must carry a minimum number of bits of information 
about those conditions. Evolutionary competition thus can select for cellular mechanisms that are 
more efficient in an abstract, information theoretic sense. Estimates based on recent experiments 
suggest that the minimum information required for reasonable growth rates is close to the maximum 
information that can be conveyed through biologically realistic regulatory mechanisms. These ideas 
are applicable most directly to unicellular organisms, but there are analogies to problems in higher 
organisms, and we suggest new experiments for both cases. 



Since Shannon's original work [1, 2] there has been the
hope that information theory would provide not only a
guide to the design of engineered communication systems
but also a framework for understanding information processing
in biological systems [3, 4, 5, 6, 7]. But there
are (at least) two major obstacles in the way of any ef- 
fort to use information theoretic ideas in the analysis of 
biological systems. First, Shannon's formulation of in- 
formation theory has no place for the value or meaning 
of the information [8], yet surely organisms find some
bits more valuable than others. Second, it is difficult 
to imagine that evolution can select for abstract quanti- 
ties such as the number of bits that an organism extracts 
from its environment. Both of these problems point away 
from general mathematical structures toward biological 
details such as the fitness or adaptive value of particular 
actions, the costs of particular errors, and the resources 
needed to carry out specific computations. 

The question of whether abstract information theo- 
retic quantities can be connected to concrete costs and 
benefits is not new, nor is it specific to the biological 
context. Fifty years ago, Kelly asked whether Shannon's 
definition of information has a meaning outside the stan- 
dard model of communication, and he showed that in 
simple models of gambling the rate at which one's win- 
nings accumulate is bounded by the information (in bits) 
that one has about the outcome of the game [9]. Kelly's
results generalize to the slightly more dignified setting of
portfolio management [10], and closely related ideas have
emerged recently in thinking about phenotypic switching
in bacteria [11, 12]. What these examples have in common
is that the benefit or growth (of investments, or of
a bacterial population) is a linear function of the control 
parameters (the fraction of the portfolio invested in each 
stock, or the fraction of organisms adopting a particular 
phenotype). This linear framework is too restrictive, but 
Kelly's classical results encourage us to think that there 
may be some more general relationship between the information
that an organism has about its environment
and its growth rate or fitness.

To be concrete, we consider single celled organisms 
in quasi-static environments, and discuss generalizations 
below. A bacterium lives in an environment described 
by a set of variables $s \equiv \{s_1, s_2, \cdots, s_K\}$; in the simplest
case, just one relevant environmental variable s might
specify the concentration of some limiting nutrient. The
fitness of the organism does not depend just on these environmental
variables, but also on internal variables such
as the expression levels of different enzymes involved in
the metabolism of the available nutrients. Let's refer to
these variables as $g \equiv \{g_1, g_2, \cdots, g_D\}$, and then the fitness
of any particular organism in its environment is defined
by some function f(g, s) [13]. This fitness function could
be complicated — there are benefits to be gained from 
metabolizing particular nutrients, but achieving these 
benefits requires the appropriate expression levels of the 
relevant enzymes, and the expression of the proteins is 
itself a cost that lowers fitness. Recent experiments attempt
to map these different factors for the case of the
lac operon in E. coli [14], resulting in an estimate of the
fitness as a function of the environmental concentration
of lactose (s) and the expression level of the lac proteins
(g), shown in Fig. 1. The important point is perhaps not
the detailed form found in particular experiments, but 
that we can imagine writing the fitness as depending on 
a combination of environmental and internal variables. 

For any given set of environmental conditions there is 
some setting of the internal variables that provides for 
the maximum fitness. If the organism could find this 
optimal operating point, then its internal state would be 
perfectly matched to the state of the environment. Even 
if the system does not find this optimum, we can still 
think of the internal state as representing what the or- 
ganism "knows" about the environmental variables. To 
quantify this knowledge, we imagine that the organism 
will encounter, over its lifetime, a distribution P(s) of 

FIG. 1: Growth rate of E. coli as a function of external sugar
(lactose) concentration and the expression level of one gene
(lacZ), as estimated in Ref. [14] and summarized in their
Eq. (5). Fitness is measured as a fractional difference from
the growth rate when both the lactose concentration and lacZ 
expression levels are zero. Sugar concentration is measured in 
units such that the half maximal benefit is reached at s = 1, 
and expression level is measured in units of the maximum 
that the cell can maintain. White line traces the optimal 
expression level for each value of s. 



The mean fitness $\langle f \rangle$ is a linear function of the conditional distribution
$P(g|s)$, while the information $I(g;s)$ is a convex function
of the conditional distribution [10]. Thus, if we consider
all conditional distributions that lead to the same average
fitness, then there is one which corresponds to the
minimum amount of information; cf. Fig. 2. The relationship
between this minimal information and the mean fitness,
$I_{\min}(\langle f \rangle)$, is analogous to the rate-distortion function
in communication theory [10]. We can also phrase
this relation as $f_{\max}(I)$, the maximum mean fitness that
can be achieved given a certain amount of information.

The existence of the function $I_{\min}(\langle f \rangle)$ means that if
organisms are to achieve a certain average level of fitness 
as they experience different environments, then the in- 
ternal state of the organism g must provide a minimum 
amount of information about the relevant variables in 
the environment. In this precise sense, achieving a crite- 
rion level of fitness requires a minimum number of bits. 
If evolution selects for greater fitness — as it does, almost 
by definition — then this selection continually raises the 
minimum number of bits that organisms need to repre- 
sent about their environment. Contrary to a widely held 
intuition, then, evolution does select for an abstract, in- 
formation theoretic property. 

The optimization problem in which we minimize the 
information $I(g;s)$ at some fixed average fitness $\langle f \rangle$ has



environmental conditions. Given the state of the en- 
vironment, organisms will adjust their internal state as 
best they can, but unless this process were (implausibly) 
noiseless, the result of the adjustment will be that the 
internal states are drawn from some probability distribution
$P(g|s)$. Thus if we were to take a snapshot, we
would find individual cells with internal states g and
their environments s drawn from the joint probability
distribution $P(g,s) = P(g|s)P(s)$. Shannon then tells
us that the internal state g provides information about 
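the environment s, and this information is

$$I(g;s) = \int d^K s \int d^D g \, P(g,s) \log_2\left[\frac{P(g,s)}{P(g)P(s)}\right] \ \mathrm{bits}, \tag{1}$$

where

$$P(g) = \int d^K s \, P(g|s) P(s). \tag{2}$$

The question is how this information content of the internal
states relates to the fitness.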
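
The information between internal state and environment can be made concrete on a discretized joint distribution. The sketch below is only an illustration: the function name `mutual_information_bits` and the three small 2x2 tables are assumptions introduced here, not quantities from the experiments discussed in the text.

```python
import numpy as np


def mutual_information_bits(p_joint):
    """Mutual information I(g;s) in bits for a discrete table p_joint[g, s]."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_joint = p_joint / p_joint.sum()            # normalize, in case of raw counts
    p_g = p_joint.sum(axis=1, keepdims=True)     # marginal over environments, shape (n_g, 1)
    p_s = p_joint.sum(axis=0, keepdims=True)     # marginal over internal states, shape (1, n_s)
    independent_table = p_g @ p_s                # P(g) P(s)
    mask = p_joint > 0                           # terms with P(g,s) = 0 contribute nothing
    ratio = p_joint[mask] / independent_table[mask]
    return float(np.sum(p_joint[mask] * np.log2(ratio)))


# Hypothetical two-state examples (rows: internal state g, columns: environment s).
perfect = np.array([[0.5, 0.0], [0.0, 0.5]])          # g always matches s
independent = np.array([[0.25, 0.25], [0.25, 0.25]])  # g ignores s
noisy = np.array([[0.4, 0.1], [0.1, 0.4]])            # g matches s 80% of the time

for name, table in [("perfect", perfect), ("independent", independent), ("noisy", noisy)]:
    print(name, mutual_information_bits(table))
```

A perfectly matched two-state system carries one bit and an unregulated one carries none; the noisy case falls in between, which is the regime relevant for real regulatory mechanisms.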

Given the joint distribution of internal and external 
states, $P(g,s)$, the average fitness over the organisms'
experience in a distribution of environments is

$$\langle f \rangle = \int d^K s \int d^D g \, P(g,s) f(g,s) \tag{3}$$
$$\langle f \rangle = \int d^K s \, P(s) \int d^D g \, P(g|s) f(g,s). \tag{4}$$

FIG. 2: Imagine mechanisms that tune the internal state of
the organism in response to the state of the environment.
Each possible mechanism achieves a certain average fitness
over the lifetime of the organism. Depending on the precision
of these mechanisms, the internal state will provide some
amount of information (in bits) about the state of the environment.
Thus, each possible mechanism corresponds to a point
in the plane relating the mean fitness $\langle f \rangle$ to the information
$I(g;s)$. Not all points in this plane are physically possible:
there is a curve $I_{\min}(\langle f \rangle)$ that separates the allowed from the
disallowed possibilities. The example shown here is calculated
for the fitness function in Fig. 1 with a distribution of
external states $P(s) \propto \exp(-2s)$.

a simple formal solution,

$$P(g|s) = \frac{1}{Z(s)} P(g) \exp[\lambda f(g,s)], \tag{5}$$

where $\lambda$ is a Lagrange multiplier that fixes the average
fitness, and $Z(s)$ serves to normalize each of the distributions
$P(g|s)$. The exponential of the fitness reminds
us of the Boltzmann distribution, and one can think of
$\lambda$ as being like an inverse temperature in statistical mechanics,
biasing the distributions toward expression levels
that ensure higher fitness (lower energy) when $\lambda$ is
larger (temperature is lower). Equation (5) doesn't completely
solve the problem because one must enforce consistency
between $P(g)$ on the right hand side and $P(g|s)$
on the left; that is, one must satisfy both Eqs. (5) and
(2). Fortunately these two equations can be combined
into an iterative algorithm that converges [15].
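
One way such an alternating iteration might look is sketched below, in the style of the Blahut-Arimoto algorithm for rate-distortion problems. Several things here are assumptions made for illustration: the grids, the function name `information_fitness_point`, and above all the toy fitness table `f`, which is a made-up saturating-benefit-minus-cost form and not the measured fitness function of Ref. [14]. Only the choice $P(s) \propto \exp(-2s)$ is taken from the Fig. 2 caption.

```python
import numpy as np

# Hypothetical grids: one environmental variable s and one expression level g.
s = np.linspace(0.0, 5.0, 101)
g = np.linspace(0.0, 1.0, 101)

# P(s) proportional to exp(-2 s), restricted to the grid and normalized.
p_s = np.exp(-2.0 * s)
p_s /= p_s.sum()

# Toy fitness table f[g, s] (an assumption, NOT the function of Ref. [14]):
# a benefit that saturates in s and is proportional to g, minus a quadratic cost.
G, S = np.meshgrid(g, s, indexing="ij")
f = 0.1 * G * S / (1.0 + S) - 0.05 * G**2


def information_fitness_point(f, p_s, lam, n_iter=300):
    """Alternate the exponential update for P(g|s) with the marginal P(g).

    Returns (information in bits, mean fitness) for the multiplier lam.
    """
    p_g = np.full(f.shape[0], 1.0 / f.shape[0])           # start from uniform P(g)
    # Subtracting each column's maximum cancels in Z(s) but avoids overflow.
    boltz = np.exp(lam * (f - f.max(axis=0, keepdims=True)))
    for _ in range(n_iter):
        w = p_g[:, None] * boltz                           # P(g) exp[lam f(g,s)]
        p_g_given_s = w / w.sum(axis=0, keepdims=True)     # divide by Z(s)
        p_g = p_g_given_s @ p_s                            # P(g) = sum_s P(g|s) P(s)
    joint = p_g_given_s * p_s                              # P(g, s)
    p_g = joint.sum(axis=1)
    indep = np.outer(p_g, p_s)                             # P(g) P(s)
    mask = (joint > 0) & (indep > 0)                       # skip underflowed entries
    info_bits = float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))
    mean_f = float(np.sum(joint * f))
    return info_bits, mean_f


# Sweep the multiplier to trace out points on the information-fitness curve.
for lam in (0.0, 300.0, 1000.0, 3000.0):
    info, fit = information_fitness_point(f, p_s, lam)
    print(f"lambda={lam:7.1f}  I={info:6.3f} bits  <f>={fit:.5f}")
```

Each value of the multiplier yields one point on the bounding curve; a fixed number of iterations is used here for simplicity, whereas a careful calculation would monitor convergence.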

In Fig. 2 we show the results of a numerical computation
of the function $f_{\max}(I)$, based on the fitness function in
Fig. 1. The precise results depend on the choice of the
distribution of environmental conditions $P(s)$, but we
have found that the scale and basic form of the function
$f_{\max}(I)$ are relatively robust so long as the distribution
spans the range of sugar concentrations over which the
optimal expression levels actually vary.

The rate-distortion, or information-fitness function 
$I_{\min}(\langle f \rangle)$ is rather smooth and featureless. Nonetheless,
careful examination of the results illustrates several dif- 
ferent points. A significant fitness advantage (~ 1%, 
compared with the maximum possible 1.6% across this 
ensemble of conditions) can be obtained by adjusting 
the gene expression level in ways that carry almost no 
information about the external world. The distribution 
of expression levels under these conditions is still quite 
broad, corresponding to a population of organisms that 
uses (continuous) phenotypic diversity to survive under 
a range of conditions, but without a tight regulatory 
mechanism that links phenotype to the external world. 

An even larger fraction of the available fitness advan- 
tage is accessible through mechanisms that use relatively 
little information, less than one bit. On the other hand, 
squeezing out the last ~ 0.1% of fitness advantage re- 
quires pushing well past one bit of information. Put an- 
other way, organisms that could only implement a true 
switch-like control, in which expression is only "on" or 
"off," would be at a small but measurable disadvantage 
in growth rate when averaged over a wide range of con- 
ditions. Organisms that have access to more than one 
bit of information thus could out-compete their one-bit 
cousins over thousands of generations. 

We note that results based on the fitness function measured
in Ref. [14] are necessarily conservative. Under
the conditions studied in those experiments, an infinite 
supply of the external sugar leads only to a ~ 10% ad- 
vantage in growth rate over the case where the sugar is 



completely absent. If we were to consider the case of 
a truly limiting nutrient, the overall scale of the fitness 
variations, and hence the curve $f_{\max}(I)$, would be
nearly ten times larger. Thus, the difference between one
bit and two bits would be ~ 1% in growth rate, which 
can be selected for on quite short time scales. 
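
To give a rough sense of the time scales involved, the sketch below estimates how many generations a faster-growing variant needs to take over a population. It rests on a simple model that is an assumption here, not part of the analysis above: two types growing exponentially without interaction, time counted in doublings of the slower type, and arbitrary start and end fractions of 1% and 99%. The function name `generations_to_sweep` is likewise hypothetical.

```python
import math


def generations_to_sweep(advantage, start_fraction=0.01, end_fraction=0.99):
    """Generations for a variant to rise from start_fraction to end_fraction.

    Assumes the variant's growth rate exceeds the resident's by the fractional
    amount `advantage`, so that after n resident doublings the odds
    variant:resident have grown by a factor 2**(advantage * n).
    """
    odds_start = start_fraction / (1.0 - start_fraction)
    odds_end = end_fraction / (1.0 - end_fraction)
    return math.log2(odds_end / odds_start) / advantage


for advantage in (0.001, 0.01):
    n = generations_to_sweep(advantage)
    print(f"advantage {advantage:.1%}: about {n:,.0f} generations")
```

Under these assumptions the required time scales inversely with the advantage, so a ten times larger fitness difference is selected ten times faster.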

The scale of the information-fitness function also is 
interesting in comparison to what we know about the 
performance of real regulatory mechanisms [16, 17]. We
recall that, because expression levels have a limited dy- 
namic range and a finite amount of noise even under 
fixed conditions, the capacity of the expression level 
to convey information (about anything) is bounded. 
With realistic parameters, based on recent experiments 
[18, 19, 20, 21, 22, 23, 24], this capacity is less than three
bits and more typically less than two bits [16]. Although
we don't have enough data to reach a firm conclusion, 
these results certainly motivate the conjecture that the 
minimum information required to reach reasonable lev- 
els of fitness is close to the maximum information that 
can be passed through real genetic regulatory elements. 

The precise form of the information-fitness function 
depends on the function f(g, s), but the asymptotic be- 
havior at high mean fitness is more nearly universal. We 
can reach this limit by considering Eq. (5) as the parameter
$\lambda$ becomes large. Then the distribution of expression
levels becomes sharply peaked around the optimum
$g_{\rm opt}(s)$ for each set of external conditions; the form of
this distribution becomes approximately Gaussian with a
variance inversely proportional to $\lambda$. Taking this Gaussian
approximation seriously, it is straightforward to compute 
the information and the mean fitness; we find

$$I_{\min}(\langle f \rangle) \approx I_0 + \frac{D}{2} \log_2\left[\frac{\langle f \rangle_{\max}}{\langle f \rangle_{\max} - \langle f \rangle}\right] + \cdots, \tag{6}$$

where $I_0$ is a constant independent of the mean fitness,
$\langle f \rangle_{\max}$ is the maximum mean fitness obtainable by an
organism that has perfect information about its environment,
and $\cdots$ denotes terms which become relevant
at lower fitness. The details of the function f(g, s) are
buried in the constant $I_0$, but the way in which the minimum
information grows as the organism approaches its
maximal mean fitness depends on the number of genes
D the cell has to control, independent of details.
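
The dependence on D can be seen in a short sketch of the Gaussian calculation. The steps below rest on assumptions not spelled out in the text: the fitness is expanded to second order around the optimum with a positive-definite curvature matrix $A(s)$ (a symbol introduced here), $P(g)$ is taken to be smooth on the scale of the narrow peak, and the distribution of optimal expression levels is assumed to have a finite entropy so that the first entropy term tends to a constant.

```latex
\begin{align*}
f(g,s) &\approx f(g_{\rm opt}(s), s) - \tfrac{1}{2}\,\delta g^{T} A(s)\,\delta g,
  \qquad \delta g = g - g_{\rm opt}(s), \\
P(g|s) &\propto \exp[\lambda f(g,s)]
  \;\Rightarrow\; \langle \delta g\,\delta g^{T} \rangle_{s} \approx [\lambda A(s)]^{-1}, \\
\langle f \rangle_{\max} - \langle f \rangle
  &\approx \tfrac{1}{2} \Big\langle \mathrm{Tr}\big[ A(s)\,(\lambda A(s))^{-1} \big] \Big\rangle_{P(s)}
  = \frac{D}{2\lambda}, \\
I(g;s) &= H[P(g)] - \big\langle H[P(g|s)] \big\rangle_{P(s)}
  \approx \frac{D}{2}\log_2 \lambda + \mathrm{const} \\
  &= \frac{D}{2}\log_2\!\left[\frac{1}{\langle f \rangle_{\max} - \langle f \rangle}\right] + \mathrm{const}.
\end{align*}
```

Each of the D controlled expression levels contributes half a bit per halving of the remaining fitness deficit, and any fixed factor inside the logarithm is absorbed into the constant.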

We have assumed, for simplicity, that variations are 
slow, so we can write the fitness as a function of internal 
and external states at the same instant of time. A more 
realistic analysis would take account of the fact that cur- 
rent values of internal control variables interact with ex- 
ternal conditions in the future, so that the information 
which controls the achievable level of fitness is predictive 
information [25, 26]. We can also generalize to consider
behaviors in complex multi-cellular organisms; the ana- 
log of the information-fitness relation then states that 
behaviors which collect some criterion level of reward 
across an ensemble of conditions must be guided by neu- 
ral representations which carry a minimum amount of 
information about these conditions. 

It is possible to measure, in real time, both growth 
rates and expression levels of particular genes in individual
unicellular organisms [27]. Repeating such experiments
under varying external conditions should allow
estimates of the fitness function f(g, s) with single cell
resolution, the mean growth rate under given conditions,
and the mutual information between internal and external
variables. Thus we could locate the organism's performance
in the information-fitness plane of Fig. 2 and
also see how close it comes to the limiting curve $f_{\max}(I)$.
For neural systems, if we have a motor control task in 
which there is a good model of the underlying mechanics 
[28], analogous experiments would compare the informa- 
tion available in central neural representations with the 
minimum required to achieve observed levels of reward 
under variable conditions. 

To summarize, achieving a criterion level of fitness or 
reward across a distribution of conditions always requires 
an internal representation of the world that captures 
some minimum number of bits. Qualitatively, this means 
that evolutionary competition will drive an increase in 
this information capacity. Quantitatively, in the case 
of unicellular organisms, the minimum information re- 
quired for reasonable levels of fitness is close to the max- 
imal information that can be transmitted through known 
genetic regulatory mechanisms. Finally, this general pic- 
ture suggests experiments which could map the informa- 
tion/fitness tradeoff in a wider variety of systems, and 
locate the performance of real organisms in relation to 
the information theoretic optimum. 